Abstract

Although the likelihood function is normalizable with respect to the data, there is no guarantee
that the same holds with respect to the model parameters. This may lead to singularities in the
expectation-value integrals of these parameters, especially if the prior information is not sufficient
to guarantee finite integral values. However, the problem may be solved by respecting the correct
Riemannian metric imposed by the likelihood. This is demonstrated for the example of
electron temperature evaluation in hydrogen plasmas.
I. INTRODUCTION



Given data d, a linear parameter c and some function f meant to explain the data, we
have

$$ d = c \cdot f(T) + \varepsilon . \tag{1} $$

The vectors have dimension N, the number of quantities measured. Due
to the measurement process the data are corrupted by noise, where $\langle \varepsilon \rangle = 0$ and $\langle \varepsilon^2 \rangle = \sigma^2$.
Then by the principle of Maximum Entropy the likelihood function reads



$$ p(D|c,\sigma,f,I) \propto \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (d_i - c f_i)^2 \right\} , \tag{2} $$



which is clearly normalizable for the data d and bounded for every parameter entering
through the functional dependence of f. The situation may change when we are looking for the
expectation value of some parameter of f, say f = f(T). Then we need to evaluate the
posterior of T with

$$ \langle T \rangle \propto \int T \, p(dT|D,I) . \tag{3} $$

In order to connect the unknown posterior to the known likelihood we marginalize over all
the parameters which enter the problem, in our case c and $\sigma$:

$$ p(dT|D,I) = \int\!\!\int p(dT, dc, d\sigma|D,I) , \tag{4} $$

and make use of Bayes' theorem:

$$ p(T, c, \sigma|D,I) \propto p(D|T, c, \sigma, I) \, p(T, c, \sigma|I) . \tag{5} $$

Commonly, the infinitesimal elements in Eq. (4) are identified with

$$ p(dT, dc, d\sigma|D,I) = p(T, c, \sigma|D,I) \, dT \, dc \, d\sigma . \tag{6} $$

In mathematical terms this would mean that the probability functions live in Euclidean
space. They do not.



II. RIEMANNIAN METRIC 

Parameterizations correspond to choices of coordinate systems. The problem to be solved
has to be invariant under reparameterization, i.e. in the space of probability functions
one has to get the same answer no matter which parameters were chosen to describe a model.
Therefore one needs a length measure $\mu$ which defines a distance between
different elements of this space of probability functions. This is achieved by applying differential
geometry to statistical models, an approach which was named 'information geometry' by
S. Amari [1]. Eq. (6) then correctly reads



$$ p(dT, dc, d\sigma|D,I) = p(T, c, \sigma|D,I) \, \mu(dT, dc, d\sigma) . \tag{7} $$



$\mu(d\theta) = \mu(\theta) \, d\theta$ is the natural Riemannian metric on a regular model (in our case the model
is parameterized by $\theta = (T, c, \sigma)$). It results from second variations of the entropy [1, 2] and
is given by

$$ \mu(d\theta) = \sqrt{\det g(\theta)} \; d\theta , \tag{8} $$

where g is the Fisher information matrix:

$$ g_{ij}(\theta) = -\int dD \; p(D|\theta,I) \, \frac{\partial^2 \log p(D|\theta,I)}{\partial\theta_i \, \partial\theta_j} . \tag{9} $$
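The invariance property can be made concrete with a minimal numerical sketch. Purely as an illustrative assumption (this is not the plasma model treated here), take a one-dimensional zero-mean Gaussian with unknown scale, parameterized either by sigma or by tau = log(sigma). The helper names `fisher` and `volume` are hypothetical, and the Fisher information is evaluated in its equivalent squared-score form rather than through the second derivative. The measure sqrt(g) dtheta assigns the same volume to the same set of models in both coordinate systems, whereas the flat measure dtheta does not.

```python
import math

def log_p(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * (x / sigma) ** 2

def fisher(to_sigma, theta, h=1e-5, n=2001, span=10.0):
    """Fisher information E[(d log p / d theta)^2], trapezoidal quadrature over x.

    to_sigma maps the chosen coordinate theta to the standard deviation.
    """
    sigma = to_sigma(theta)
    dx = 2.0 * span * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        x = -span * sigma + k * dx
        # central finite difference of log p with respect to theta
        score = (log_p(x, to_sigma(theta + h)) - log_p(x, to_sigma(theta - h))) / (2.0 * h)
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * score ** 2 * math.exp(log_p(x, sigma)) * dx
    return total

def volume(to_sigma, lo, hi, steps=100):
    """Integral of sqrt(g(theta)) dtheta over [lo, hi] using the midpoint rule."""
    width = (hi - lo) / steps
    return sum(math.sqrt(fisher(to_sigma, lo + (k + 0.5) * width)) * width
               for k in range(steps))

# The same family of models, sigma in [0.5, 2], in two coordinate systems.
vol_sigma = volume(lambda s: s, 0.5, 2.0)
vol_tau = volume(math.exp, math.log(0.5), math.log(2.0))

# Flat (Euclidean) volumes of the same set differ between the two coordinates.
flat_sigma = 2.0 - 0.5
flat_tau = math.log(2.0) - math.log(0.5)

print("Riemannian volume:", vol_sigma, vol_tau)
print("flat volume:      ", flat_sigma, flat_tau)
```

The two Riemannian volumes agree up to quadrature error, which is the coordinate independence that the flat measure lacks.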



For the above likelihood the metric reads explicitly

$$ \mu(\sigma, c, T) \propto \frac{c}{\sigma^3} \sqrt{ (f^{\top} f)(f'^{\top} f') - (f^{\top} f')^2 } , \tag{10} $$

where $f' = \partial f / \partial T$.
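As a plausibility check of the explicit metric, the following sketch builds the Fisher matrix for theta = (T, c, sigma) and compares sqrt(det g) with a closed form. It assumes a Gaussian likelihood that is normalized with respect to the data (i.e. including the factor sigma^(-N)), for which the Fisher matrix takes the standard form g_ab = (d_a m . d_b m) / sigma^2 for parameters of the mean m = c f(T), and g_sigma,sigma = 2N / sigma^2. The model function f_i(T) = T x_i / (T + x_i) with x_i = i is borrowed from the simple example of Sec. III; all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def model(T, x):
    """Return f(T) and its derivative with respect to T, component by component."""
    f = [T * xi / (T + xi) for xi in x]
    f_prime = [xi ** 2 / (T + xi) ** 2 for xi in x]
    return f, f_prime

def fisher_matrix(T, c, sigma, x):
    """Fisher matrix in the coordinates (T, c, sigma) for mean c*f(T), std sigma."""
    f, fp = model(T, x)
    s2 = sigma ** 2
    g_TT = c * c * dot(fp, fp) / s2
    g_Tc = c * dot(f, fp) / s2
    g_cc = dot(f, f) / s2
    g_ss = 2.0 * len(x) / s2
    return [[g_TT, g_Tc, 0.0],
            [g_Tc, g_cc, 0.0],
            [0.0, 0.0, g_ss]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def sqrt_det_closed_form(T, c, sigma, x):
    """(c / sigma^3) * sqrt(2N) * sqrt((f.f)(f'.f') - (f.f')^2)."""
    f, fp = model(T, x)
    gram = dot(f, f) * dot(fp, fp) - dot(f, fp) ** 2
    return math.sqrt(2.0 * len(x)) * c / sigma ** 3 * math.sqrt(gram)

x = [1.0, 2.0, 3.0]
results = []
for T, c, sigma in [(1.0, 1.0, 0.1), (5.0, 2.0, 0.3)]:
    numeric = math.sqrt(det3(fisher_matrix(T, c, sigma, x)))
    closed = sqrt_det_closed_form(T, c, sigma, x)
    results.append((numeric, closed))
    print(T, c, sigma, numeric, closed)
```

Under these assumptions the measure is linear in c and falls off as sigma^(-3), while its T dependence sits entirely in the square root.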



Notice that this approach rests on the assumption that the hypothesis space of the
likelihood defines the metric in which the calculation is carried out. This may not be the case if some prior
information was already used during data acquisition, e.g. if the experimentalist uses expert
knowledge to separate 'correct' data from the rest. Such a selection immediately rules out certain
parts of all possible realizations of the likelihood function and results in a different hypothesis
space.



III. SIMPLE EXAMPLE 



First we want to demonstrate the relevance of using the correct metric with a simple
example which already has all the features of the real-world problem treated further below.
The model function is

$$ f_i(T) = T \cdot (T + x_i)^{-1} x_i , \tag{11} $$

where the index i runs over the data points $d_i$. For simplicity let us assume
that the variance $\sigma^2$ is known, so that we only have to marginalize over c in order to get the
posterior. What happens if we do not use the Riemannian metric? Then the marginalization
integral over c reads

$$ p(T|D,I) \propto \int dc \; p(D|T,c,I) \, p(c|I) . \tag{12} $$

In order to facilitate the analytic calculation, the exponent of the likelihood is written as a
quadratic form in c:

$$ \sum_i (d_i - c f_i)^2 = d^{\top} d - \frac{(d^{\top} f)^2}{f^{\top} f} + f^{\top} f \, (c - c_0)^2 , \tag{13} $$

where $c_0 = d^{\top} f / f^{\top} f$. For the prior p(c|I) the only thing we know is that c lies
between an upper and a lower limit, where it is reasonable to assume that the upper (lower)
bound is given by an unknown factor n (1/n) of the value $c_0$ at which the maximum of the
likelihood occurs. The principle of maximum entropy then gives a flat prior:

$$ p(c|I) = \frac{1}{c_0 \, (n - 1/n)} , \qquad \frac{c_0}{n} < c < n \, c_0 . \tag{14} $$



The integral over the c-dependent parts then reads 



$$ \int_{c_0/n}^{n c_0} dc \; \exp\left\{ -\frac{1}{2\sigma^2} (f^{\top} f) \, |c - c_0|^2 \right\} . \tag{15} $$



One may check that for $f^{\top} f \gg \sigma^2$ it is allowed to shift the integral boundaries to ±infinity
while affecting the value of the integral only by a small error. As a matter of
fact, for the chosen model parameters N = 3, $x_i = i$, T = 1, c = 1 and $\sigma = 0.1$ the error is of
the order of $10^{-7}$ of the correct integral. Notice that this is almost the same for every T
between 0 and infinity. We finally get

$$ p(T|D,I) \propto \frac{\sqrt{f^{\top} f}}{d^{\top} f} \exp\left\{ -\frac{1}{2\sigma^2} \left[ d^{\top} d - \frac{(d^{\top} f)^2}{f^{\top} f} \right] \right\} , \tag{16} $$

where the factor $1/c_0 = f^{\top} f / d^{\top} f$ stems from the normalization of the prior in Eq. (14).
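The two steps used in this marginalization, completing the square as in Eq. (13) and extending the limits of the Gaussian integral of Eq. (15), can be checked numerically. The sketch below uses the stated parameters N = 3, x_i = i, T = 1, c = 1 and sigma = 0.1. Two things are assumptions made only for illustration: the data are taken to be noise free, and the unknown factor n is set to the hypothetical value n = 2, since the text does not specify it.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, 2.0, 3.0]
T, c_true, sigma = 1.0, 1.0, 0.1
n = 2.0  # hypothetical choice of the unknown bound factor

f = [T * xi / (T + xi) for xi in x]
d = [c_true * fi for fi in f]  # noise-free data (assumption)

ff, df, dd = dot(f, f), dot(d, f), dot(d, d)
c0 = df / ff  # location of the likelihood maximum in c

# Step 1: the quadratic form in c, evaluated at an arbitrary test value.
c_test = 0.7
lhs = sum((di - c_test * fi) ** 2 for di, fi in zip(d, f))
rhs = dd - df ** 2 / ff + ff * (c_test - c0) ** 2

# Step 2: relative error made by extending the limits [c0/n, n*c0] to
# (-inf, +inf). The integrand is a Gaussian in c with width s, so the
# neglected part is the sum of the two tail probabilities.
s = sigma / math.sqrt(ff)
z_lower = (c0 - c0 / n) / s
z_upper = (n * c0 - c0) / s
rel_err = 0.5 * math.erfc(z_lower / math.sqrt(2.0)) + 0.5 * math.erfc(z_upper / math.sqrt(2.0))

print("quadratic form:", lhs, rhs)
print("relative error of extended limits:", rel_err)
```

The size of the error depends on n through the distance of the nearer bound from $c_0$ in units of the width s, so a different n gives a different (but for moderate n still small) value.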



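A look at the behavior for small and large T gives

$$ \lim_{T \to 0} p(T|D,I) \propto \frac{\sqrt{N}}{\sum_i d_i} \exp\left\{ -\frac{1}{2\sigma^2} \left[ d^{\top} d - \frac{(\sum_i d_i)^2}{N} \right] \right\} \propto \mathrm{const} , \tag{17} $$

$$ \lim_{T \to \infty} p(T|D,I) \propto \frac{\sqrt{x^{\top} x}}{d^{\top} x} \exp\left\{ -\frac{1}{2\sigma^2} \left[ d^{\top} d - \frac{(d^{\top} x)^2}{x^{\top} x} \right] \right\} \propto \mathrm{const} . \tag{18} $$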




FIG. 1: Posterior p(T|D,I) with (solid line) and without (dashed line) the Riemannian metric.
Neglecting the metric produces non-vanishing tails.

Though one has no problem with the lower limit since the integrand is regular, the
non-vanishing posterior distribution for T → ∞ leads to an expectation value which depends on
where the integration limits are set (see Fig. 1).
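The dependence on the integration limit can be illustrated with a short sketch. It assumes that the posterior obtained with the flat measure has the form sqrt(f.f) / (d.f) * exp{-[d.d - (d.f)^2 / f.f] / (2 sigma^2)}, i.e. the Gaussian integral over c combined with the 1/c_0 normalization of the flat prior. The data are taken to be noise free for T = 1, c = 1 (an assumption for illustration), and `mean_T` is a hypothetical helper that evaluates the expectation value with an upper cutoff T_max.

```python
import math

X = [1.0, 2.0, 3.0]
SIGMA = 0.1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def model(T):
    return [T * xi / (T + xi) for xi in X]

D = model(1.0)  # noise-free data for T = 1, c = 1 (assumption)

def posterior_flat(T):
    """Unnormalized posterior in T obtained with the flat measure dT dc."""
    f = model(T)
    ff, df, dd = dot(f, f), dot(D, f), dot(D, D)
    return math.sqrt(ff) / df * math.exp(-(dd - df * df / ff) / (2.0 * SIGMA ** 2))

def mean_T(T_max, step=0.02):
    """Expectation value of T on (0, T_max) using the midpoint rule."""
    n = int(round(T_max / step))
    num = 0.0
    den = 0.0
    for k in range(n):
        T = (k + 0.5) * step
        p = posterior_flat(T)
        num += T * p * step
        den += p * step
    return num / den

means = {T_max: mean_T(T_max) for T_max in (10.0, 100.0, 1000.0)}
for T_max, value in means.items():
    print("T_max =", T_max, " <T> =", value)
```

Because the tail approaches a constant, the computed expectation value keeps growing with the cutoff instead of settling near the value T = 1 that generated the data.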

Now we include the Riemannian metric in the calculation. From Eq. (10) we get an
additional factor c, so the integration over the c-dependent parts changes to

$$ \int_{c_0/n}^{n c_0} dc \; c \, \exp\left\{ -\frac{1}{2\sigma^2} (f^{\top} f) \, |c - c_0|^2 \right\} . \tag{19} $$



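Again it is allowed to extend the integration limits to ±infinity with only minor error.
The full posterior then reads

$$ p(T|D,I) \propto \exp\left\{ -\frac{1}{2\sigma^2} \left[ d^{\top} d - \frac{(d^{\top} f)^2}{f^{\top} f} \right] \right\} \sqrt{ \frac{(f^{\top} f)(f'^{\top} f') - (f^{\top} f')^2}{f^{\top} f} } , \tag{20} $$

with $f' = \partial f / \partial T$.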



What is now the behavior of p(T|D,I) for T approaching 0 and infinity? The exponent in
Eq. (20) was already shown in Eqs. (17) and (18) to become constant, so we only have
to look at the square root:

$$ \lim_{T \to 0} \sqrt{ \frac{(f^{\top} f)(f'^{\top} f') - (f^{\top} f')^2}{f^{\top} f} } \propto T \to 0 , \tag{21} $$

$$ \lim_{T \to \infty} \sqrt{ \frac{(f^{\top} f)(f'^{\top} f') - (f^{\top} f')^2}{f^{\top} f} } \propto T^{-2} \to 0 . \tag{22} $$

So indeed the square root term which stems from the metric does take care of zero tails in
the posterior! The nice decrease towards 0 is shown in Fig. 1 by the solid line.
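The effect of the metric on the tails can be sketched numerically. The code assumes that the posterior with the metric is the exponential of the flat-measure result multiplied by sqrt(f'.f' - (f.f')^2 / f.f), and that the flat-measure posterior carries the prefactor sqrt(f.f) / (d.f). Both forms, the noise-free data for T = 1, c = 1, and the function names are assumptions made for illustration.

```python
import math

X = [1.0, 2.0, 3.0]
SIGMA = 0.1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def model(T):
    """Model function f(T) and its derivative with respect to T."""
    f = [T * xi / (T + xi) for xi in X]
    f_prime = [xi ** 2 / (T + xi) ** 2 for xi in X]
    return f, f_prime

D = model(1.0)[0]  # noise-free data for T = 1, c = 1 (assumption)

def exponential_part(T):
    f, _ = model(T)
    ff, df, dd = dot(f, f), dot(D, f), dot(D, D)
    return math.exp(-(dd - df * df / ff) / (2.0 * SIGMA ** 2))

def posterior_flat(T):
    """Unnormalized posterior with the flat measure."""
    f, _ = model(T)
    return math.sqrt(dot(f, f)) / dot(D, f) * exponential_part(T)

def posterior_metric(T):
    """Unnormalized posterior with the Riemannian measure."""
    f, fp = model(T)
    gram = dot(f, f) * dot(fp, fp) - dot(f, fp) ** 2
    # max() guards against a tiny negative value caused by rounding
    return exponential_part(T) * math.sqrt(max(gram, 0.0) / dot(f, f))

for T in (1e-3, 1.0, 1e3):
    print(T, posterior_flat(T), posterior_metric(T))
```

Relative to their values at T = 1, the flat-measure posterior stays at a finite level at both ends, while the version with the metric drops by several orders of magnitude.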



IV. REAL WORLD PROBLEM 

In the problem of determining the electron temperature in a hydrogen plasma heated
by electron cyclotron resonance, the model function f depends in a quite complicated way
on the temperature T:

$$ f(T) = -V (R - V)^{-1} x . \tag{23} $$

Both V and R are matrices, but only the diagonal matrix V depends on T with entries on 
the diagonal: 

Vu = , (24) 

where and foj are constants with respect to ion species i. Since the sensitivity of the
measurement apparatus is unknown, one has to introduce a linear parameter c in order to
relate the data to the model, i.e. Eq. (1). In contrast to our simple problem we are not so
fortunate as to know the variance $\sigma^2$ exactly. The experimentalist can only provide an estimate
s of the true errors $\sigma$ relative to each other but not of the total scale, so that we have
to introduce an overall multiplication factor $\omega$, with $\sigma_i = \omega s_i$. In order to assign a prior to
$\omega$ the outlier-tolerant approach [3] was chosen:

$$ p(\omega|a,\gamma,I) = 2 \, \frac{a^{\gamma}}{\Gamma(\gamma)} \left( \frac{1}{\omega} \right)^{2\gamma} \exp\left\{ -\frac{a}{\omega^2} \right\} \frac{1}{\omega} . \tag{25} $$

The expectation value of $\omega$ should be one, since the experimentalist makes his estimate
according to his best knowledge. Furthermore, from the characteristics of the measurement
process one can tell that the best guess of s should not deviate by more than 50% from the
true $\sigma$. This results in a = 1.28 and $\gamma$ = 2.0076.
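These two numbers can be cross-checked in a few lines, assuming the prior has the form p(omega) = 2 a^gamma / Gamma(gamma) * omega^(-2 gamma - 1) * exp(-a / omega^2). With the substitution u = a / omega^2 the moments reduce to Gamma functions: the mean is sqrt(a) Gamma(gamma - 1/2) / Gamma(gamma) and the second moment is a / (gamma - 1). The variable names are hypothetical.

```python
import math

a = 1.28
gamma = 2.0076

# First and second moment of omega under the assumed prior.
mean_omega = math.sqrt(a) * math.gamma(gamma - 0.5) / math.gamma(gamma)
second_moment = a / (gamma - 1.0)  # equals a * Gamma(gamma - 1) / Gamma(gamma)
std_omega = math.sqrt(second_moment - mean_omega ** 2)

print("mean of omega:", mean_omega)
print("standard deviation of omega:", std_omega)
```

Under this assumed form the mean comes out close to one and the standard deviation close to one half, in line with the two requirements stated above.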

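Now we follow the route explained above to evaluate the expectation value of T. Again
we start by marginalizing c (with the flat prior of Eq. (14)) and $\omega$ without making use of
the Riemannian metric. This gives the posterior in T

$$ p(T|d,I) \propto \frac{\sqrt{\hat{f}^{\top} \hat{f}}}{\hat{d}^{\top} \hat{f}} \left[ a + \frac{1}{2} \left( \hat{d}^{\top} \hat{d} - \frac{(\hat{d}^{\top} \hat{f})^2}{\hat{f}^{\top} \hat{f}} \right) \right]^{-(N + 2\gamma - 1)/2} . \tag{26} $$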

For simplicity of notation the hat denotes that the values have been divided by the
estimated error s: $\hat{d}_i = d_i / s_i$. The posterior is displayed in Fig. 2. Here we face the
problem we observed above in the simple example. Though a non-vanishing tail for T → 0
is not so harmful, the increase with T → ∞ results in a divergence.

FIG. 2: Posterior p(T|D,I) with (solid line) and without (dashed line) the Riemannian metric
(dotted line). The incision at T = 118.43 K is a single point which is due to the parameterization
of the physical model. It does not affect the integrability.

Help comes from respecting the correct Riemannian metric. Then the posterior reads

$$ p(T|D,I) \propto \frac{\mu(T)}{\sqrt{\hat{f}^{\top} \hat{f}}} \left[ a + \frac{1}{2} \left( \hat{d}^{\top} \hat{d} - \frac{(\hat{d}^{\top} \hat{f})^2}{\hat{f}^{\top} \hat{f}} \right) \right]^{-(N + 2\gamma + 2)/2} , \tag{27} $$

where $\mu(T)$ is just the metric of Eq. (10) without the terms in c and $\omega$ (marginalized over).
The situation changes completely (see Fig. 2) and the integral now becomes feasible.



V. CONCLUSION 



The mathematically correct way to deal with marginalization integrals is to use the
Riemannian metric. This invariant measure defines the correct infinitesimal elements
to be integrated over. Since parameterizations of a model may be subjective and vary with
the investigator of a problem, this is the only consistent way to get comparable answers in
probability space.